Channel and spatial attention mechanism for fashion image captioning

Abstract

Image captioning aims to automatically generate one or more description sentences for a given input image. Most existing methods use an encoder-decoder model, which mainly focuses on recognizing and capturing the relationships between objects appearing in the input image. However, when generating captions for fashion images, it is important not only to describe the items and their relationships but also to mention the attribute features of the clothes (shape, texture, style, fabric, and more). In this study, a novel fashion image captioning model is proposed that can capture the items, their relationships, and their attribute features. Two different attention mechanisms (spatial attention and channel-wise attention) are incorporated into the traditional encoder-decoder model. The resulting model dynamically interprets the caption sentence over the multi-layer feature map and also along the depth (channel) dimension of the feature map. We evaluate the proposed architecture on the Fashion-Gen dataset using three metrics (CIDEr, ROUGE-L, and BLEU-1) and achieve scores of 89.7, 50.6, and 45.6, respectively. In these experiments, the proposed method shows a significant performance improvement for fashion image captioning and outperforms other state-of-the-art methods.
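The abstract does not give the model's equations, so the following is only a minimal sketch, in plain Python, of how channel-wise and spatial attention can be combined over a CNN feature map at a single decoding step. It assumes a generic additive (soft) attention formulation with channel attention applied before spatial attention. The ordering, the average pooling, the parameter names `U`, `Wv`, `Wh` and `w`, and the random weights are all hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def channel_attention(V: Matrix, h: Vector, U: Matrix) -> Vector:
    """One weight per channel (beta), conditioned on the decoder state h.

    V is K x C (K spatial locations, C channels); U is a hypothetical
    C x d projection of the decoder hidden state.
    """
    K, C = len(V), len(V[0])
    # Average-pool each channel over all spatial locations.
    pooled = [sum(V[k][c] for k in range(K)) / K for c in range(C)]
    proj = matvec(U, h)
    scores = [math.tanh(p + q) for p, q in zip(pooled, proj)]
    return softmax(scores)


def spatial_attention(V: Matrix, h: Vector, Wv: Matrix, Wh: Matrix, w: Vector) -> Vector:
    """One weight per spatial location (alpha), using additive scoring."""
    hproj = matvec(Wh, h)  # m-dimensional projection of the decoder state
    scores = []
    for v in V:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(Wv, v), hproj)]
        scores.append(sum(wi * xi for wi, xi in zip(w, hidden)))
    return softmax(scores)


def attend(V: Matrix, h: Vector, params: Dict) -> Tuple[Vector, Vector, Vector]:
    """Channel-wise attention first, then spatial attention on the reweighted map."""
    beta = channel_attention(V, h, params["U"])
    # Rescale every channel of every location by its channel weight.
    V_ch = [[beta[c] * v[c] for c in range(len(v))] for v in V]
    alpha = spatial_attention(V_ch, h, params["Wv"], params["Wh"], params["w"])
    C = len(V[0])
    # Context vector: alpha-weighted sum of the channel-reweighted features.
    context = [sum(alpha[k] * V_ch[k][c] for k in range(len(V))) for c in range(C)]
    return beta, alpha, context


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    K, C, d, m = 4, 3, 5, 6  # locations (a 2x2 grid), channels, hidden size, attention size
    V = random_matrix(rng, K, C)
    h = [rng.uniform(-1, 1) for _ in range(d)]
    params = {
        "U": random_matrix(rng, C, d),
        "Wv": random_matrix(rng, m, C),
        "Wh": random_matrix(rng, m, d),
        "w": [rng.uniform(-0.5, 0.5) for _ in range(m)],
    }
    beta, alpha, context = attend(V, h, params)
    print("channel weights:", [round(b, 3) for b in beta])
    print("spatial weights:", [round(a, 3) for a in alpha])
    print("context vector:", [round(z, 3) for z in context])
```

In a trained captioning model these weights would be learned jointly with the decoder, and the context vector would feed the next-word prediction. Here they are random and serve only to show the shapes involved. They also show that both the channel weights and the spatial weights form probability distributions.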

Similar resources

Image Captioning with Attention

In the past few years, neural networks have fueled dramatic advances in image classification. Emboldened, researchers are looking for more challenging applications for computer vision and artificial intelligence systems. They seek not only to assign numerical labels to input data, but to describe the world in human terms. Image and video captioning is among the most popular applications in this t...

Image Captioning using Visual Attention

This project aims at generating captions for images using neural language models. There has been a substantial increase in the number of proposed models for the image captioning task since neural language models and convolutional neural networks (CNNs) became popular. Our project is based on one such work, which uses a variant of a recurrent neural network coupled with a CNN. We intend to enhance t...

Text-Guided Attention Model for Image Captioning

Visual attention plays an important role in understanding images and has demonstrated its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...

Social Image Captioning: Exploring Visual Attention and User Attention

Image captioning with natural language has been an emerging trend. However, the social image, associated with a set of user-contributed tags, has rarely been investigated for a similar task. The user-contributed tags, which could reflect the user attention, have been neglected in conventional image captioning. Most existing image captioning models cannot be applied directly to social image ca...

Attention Correctness in Neural Image Captioning

Attention map visualization: We visualize the attention maps of both the implicit attention model and our supervised attention model on the Flickr30k test set. As mentioned in the paper, 909 noun phrases are aligned for the implicit model and 901 for the supervised model. 635 of these alignments are common to both, and 595 of them have corresponding bounding boxes. Here we present a subset due ...

Journal

Journal title: International Journal of Electrical and Computer Engineering (IJECE)

Year: 2023

ISSN: 2722-2578, 2722-256X

DOI: https://doi.org/10.11591/ijece.v13i5.pp5833-5842